The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices
Authors
Abstract
In this paper, we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical model⁴ on 2D and 3D lattices [12, 25]. The Canonical model is specified by four components:

(i) a geometric representation of a target protein structure with amino acid residues, via its contact graph;

(ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P);

(iii) an energy function Φ, defined in terms of the target structure, that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues. In the Canonical model, Φ assigns each H-H residue contact in the contact graph a value of −1 and all other contacts a value of 0;

(iv) an upper bound λ on the ratio between H and P amino acids. This bound limits the number of H residues in the sequence S and prevents the solution from being a biologically meaningless all-H sequence.

The sequence S is designed by specifying which residues are H and which are P in a way that realizes the global minimum of the energy function Φ. In this paper, we prove the following results:

(1) An earlier proof in [12] of the NP-completeness of finding the global energy minimum for the PSD problem on 3D lattices was based on the NP-completeness of the same problem on 2D lattices. However, that reduction was not correct. We show that the global energy minimum for the PSD problem on 2D lattices can in fact be found in polynomial time. Nevertheless, we show that finding the global energy minimum for the PSD problem on 3D lattices is indeed NP-complete, using a different reduction from the problem of finding the largest clique in a graph.
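To make components (i) through (iv) concrete, here is a minimal brute-force sketch in Python. It is an illustration under stated assumptions, not the paper's method. Residues are indexed 0 to n−1 and the contact graph is given as a list of index pairs. The ratio bound λ is assumed to be already converted into a maximum number `max_h` of H residues. The energy is Φ(S) = −(number of contacts whose two endpoints are both H). The function names `canonical_energy` and `design_exact` are hypothetical. The search enumerates every admissible H/P assignment, so it takes exponential time. That is consistent with the 3D NP-completeness result, and it is *not* the polynomial-time 2D algorithm described above.

```python
from itertools import combinations


def canonical_energy(seq, contacts):
    """Canonical-model energy Phi: -1 per H-H contact, 0 for every other contact."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def design_exact(n, contacts, max_h):
    """Exhaustive PSD sketch: mark at most max_h of the n residues as H, minimizing Phi.

    Assumes the ratio bound lambda has been turned into the integer cap max_h.
    Runs in exponential time; intended only to make the objective concrete.
    """
    best_seq, best_phi = "P" * n, 0
    for h in range(1, max_h + 1):
        for hs in combinations(range(n), h):
            chosen = set(hs)
            seq = "".join("H" if i in chosen else "P" for i in range(n))
            phi = canonical_energy(seq, contacts)
            if phi < best_phi:  # keep the first sequence reaching a new minimum
                best_seq, best_phi = seq, phi
    return best_seq, best_phi
```

Consider a 6-residue chain folded into a 2×3 rectangle on the square lattice, with sites (0,0), (1,0), (2,0), (2,1), (1,1), (0,1). Its non-consecutive lattice contacts are (0, 5) and (1, 4). With at most two H residues, `design_exact(6, [(0, 5), (1, 4)], 2)` returns `("HPPPPH", -1)`. Raising the cap to four gives `("HHPPHH", -2)`, because both contacts can then be H-H.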
(2) Even though finding the global energy minimum on 3D lattices is NP-complete, we show that it can be approximated arbitrarily closely in polynomial time. We give a polynomial-time approximation scheme (PTAS) that takes appropriate combinations of optimal global energy minima of substrings of the sequence S. Our technique for designing this PTAS uses the shifted slice-and-dice approach in [6, 17, 18]. This result improves on the previous best polynomial-time approximation algorithm for finding the global energy minimum, given in [12], which has a performance ratio of 1/2.

⁴ The Canonical model is neither the same as nor a subset of the Grand Canonical (GC) model in [19, 24]; see Section 1.3 for more details.
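The shifting idea behind such a PTAS can be sketched in Python under simplifying assumptions. This is a generic shifting-strategy illustration, not the paper's shifted slice-and-dice algorithm. Each residue is identified with its cubic-lattice site. The objective is phrased as in (iv): choose at most `max_h` H sites to maximize the number of H-H contacts, which equals −Φ. For each shift s in 0..k−1, the lattice is cut into k×k×k cubes whose boundaries are offset by s on every axis, and contacts crossing a cut are dropped. Each cube is then solved exhaustively, and a knapsack-style DP splits the H budget among cubes. A unit-length contact is cut for exactly one shift, so the best shift keeps at least a (1 − 1/k) fraction of an optimal solution's contacts. The names `lattice_contacts`, `best_per_size`, and `shifted_approx` are hypothetical. `lattice_contacts` also assumes every pair of adjacent occupied sites is a contact, whereas real contact graphs usually exclude residues that are consecutive along the chain.

```python
from collections import defaultdict
from itertools import combinations


def lattice_contacts(points):
    """Hypothetical helper: join every pair of occupied cubic-lattice sites at unit distance."""
    occupied = set(points)
    edges = []
    for p in occupied:
        for axis in range(3):
            q = tuple(c + (1 if a == axis else 0) for a, c in enumerate(p))
            if q in occupied:
                edges.append((p, q))
    return edges


def best_per_size(vertices, edges, max_h):
    """best[j] = max number of contacts induced by exactly j chosen sites (exhaustive)."""
    best = [0] * (min(max_h, len(vertices)) + 1)
    for j in range(len(best)):
        for subset in combinations(vertices, j):
            chosen = set(subset)
            induced = sum(1 for u, v in edges if u in chosen and v in chosen)
            best[j] = max(best[j], induced)
    return best


def shifted_approx(points, edges, max_h, k):
    """Shifting-strategy sketch: >= (1 - 1/k) * optimum H-H contacts with <= max_h H sites."""
    answer = 0
    for s in range(k):
        # Cut the lattice into k x k x k cubes whose boundaries are shifted by s on each axis.
        cell_of = {p: tuple((c - s) // k for c in p) for p in points}
        cells, cell_edges = defaultdict(list), defaultdict(list)
        for p, cell in cell_of.items():
            cells[cell].append(p)
        for u, v in edges:
            if cell_of[u] == cell_of[v]:  # contacts crossing a cut are dropped
                cell_edges[cell_of[u]].append((u, v))
        # Knapsack-style DP: dp[b] = best contact count using exactly b H sites so far.
        dp = [0]
        for cell, members in cells.items():
            best = best_per_size(members, cell_edges[cell], max_h)
            merged = [0] * min(max_h + 1, len(dp) + len(best) - 1)
            for b, val in enumerate(dp):
                for j, gain in enumerate(best):
                    if b + j <= max_h:
                        merged[b + j] = max(merged[b + j], val + gain)
            dp = merged
        answer = max(answer, max(dp))
    return answer
```

For fixed k, each cube holds at most k³ sites, so the exhaustive step costs a constant per cube. The overall running time is therefore polynomial in the number of sites and the budget, and the guarantee improves as k grows. On a 3×2×2 block with at most four H sites, the exact optimum is a unit square with 4 contacts. `shifted_approx(points, edges, 4, 2)` also finds 4.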
Similar resources
The inverse protein folding problem on 2D and 3D lattices
In this paper, we investigate the inverse protein folding (IPF) problem under the Canonical model on 3D and 2D lattices [13, 26]. In this problem, we are given a contact graph G = (V, E) of a protein sequence that is embeddable in a 3D (respectively, 2D) lattice and an integer 1 ≤ K ≤ |V|. The goal is to find an induced subgraph of G with at most K vertices and the maximum number of edges. In thi...
Gypsum Dissolution Effects on the Performance of a Large Dam (TECHNICAL NOTE)
The Upper Gotvand Dam is constructed on the Karun River in southwestern Iran. In this paper, 2D and 3D models of the dam together with the foundation and abutments were constructed and several seepage analyses were carried out. Then the gypsum veins scattered throughout the foundation ground and also the seepage pattern were included in the models, hence the dissolution law of gypsum, ...
A Combinatorial Toolbox for Protein Sequence Design and Landscape Analysis in the Grand Canonical Model
In modern biology, one of the most important research problems is to understand how protein sequences fold into their native 3D structures. To investigate this problem at a high level, one wishes to analyze the protein landscapes, i.e., the structures of the space of all protein sequences and their native 3D structures. Perhaps the most basic computational problem at this level is to take a tar...
In Silico Prediction and Docking of Tertiary Structure of Multifunctional Protein X of Hepatitis B Virus
Hepatitis B virus (HBV) infection is a universal health problem and may result in acute, fulminant, or chronic hepatitis, liver cirrhosis, or hepatocellular carcinoma. The sequence of protein X of HBV was retrieved from the UniProt database. ProtParam from the ExPASy server was used to investigate the physicochemical properties of the protein. Homology modeling was carried out using the Phyre2 server, and refin...
Studies of protein designability using reduced models
One of the most important problems in computational structural biology is protein designability, that is, why protein sequences are not random strings of amino acids but instead show regular patterns that encode protein structures. Many previous studies that have attempted to solve the problem have relied upon reduced models of proteins. In particular, the 2D square and the 3D cubic lattices toget...